Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve indentation when using parseDocument #338

Closed

Conversation

Nathan-Fenner
Copy link

@Nathan-Fenner Nathan-Fenner commented Nov 18, 2021

This fixes #283 (with some qualifications).

Documents like the following now get their indentation preserved:

foo0:
- a
- b
- c
foo2:
  - a
  - b
  - c
foo8:
        - a
        - b
        - c
bar:
       more:
                     indentation:
                                             is: needed

when you enable the new option preserveCollectionIndentation and also enable the keepSourceTokens option:

const document = parseDocument(yamlContents, {
  keepSourceTokens: true,
  preserveCollectionIndentation: true,
});

If you insert/edit/move around nodes in this document, or insert them into others, their indentation is also preserved.


I added the new option preserveCollectionIndentation purely as a backwards compatibility measure. It could also default to true, in which case you'd only need to enable keepSourceTokens for it to work.

Unfortunately at least with the way the parser is structured right now, we do need the source tokens since they contain the relevant indentation information that makes the preservation possible.


I added a new test similar to the example above which ensures that the document round-trips exactly, maintaining all indentation.

I also added a test that verifies all documents in the yaml-suite can be round-tripped twice to verify that this doesn't break them (not that we'd expect that).

The one special edge case is that if we have a document like

list:
- not indented
- at all

and we replace the list with an object like {greeting: "hello"} as in:

const doc = parseDocument(yamlContents);
doc.set('list', {greeting: "hello"});

console.info(doc.toString());

then we need to insert some new indentation; lists are allowed to have an indentStep of 0 since the hyphen marks that the item starts, but non-lists are not:

# right
list:
  greeting: hello

# wrong (missing indentation changes meaning of document)
list:
greeting: hello

in this case, we default back to the configured indentStep (e.g. 2).

Copy link
Owner

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really promising! I left a couple of line comments below, but I'll need to spend a bit more time on a proper review at some point.

My main initial concerns are:

  1. Is this a good idea to include as a partial implementation first, or do we need to ensure that it also covers block sequences and flow collections?
  2. What happens when a new key/value pair is added to a map that has pairs with a set srcIndentStep? Could/should the value (also?) be set on the collection rather than the pair?

valueIndent !== undefined &&
valueIndent >= keyIndent
) {
pair.srcIndentStep = valueIndent - keyIndent
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think this ought to be conditional on both keepSourceTokens and preserveCollectionIndentation? As the new field is added to an AST rather than CST node, is there a reason to have any connection between these options?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, for some reason in my initial tests I thought that we wouldn't have the right source indent info without keepSourceTokens enabled. So, this is totally unnecessary!

Comment on lines 116 to 118
keyIndent !== undefined &&
valueIndent !== undefined &&
valueIndent >= keyIndent
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
keyIndent !== undefined &&
valueIndent !== undefined &&
valueIndent >= keyIndent
value.indent >= key.indent

collItem has already been destructured on line 23, and there should be no case where these values are not numbers. And if there is, any >= with an undefined will fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeScript doesn't permit you to compare number|undefined values with >=, even though JavaScript has the right semantics. So we either need a cast (which seems unnecessary) or we need to explicitly include the logic to exclude undefined values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've fixed this to avoid the as any casts; I didn't notice the destructuring in the first past. Now it's simplified (somewhat) to

const keyIndent: number | undefined =
  key && 'indent' in key ? key.indent : undefined
const valueIndent: number | undefined =
  value && 'indent' in value ? value.indent : undefined

The extra logic here is needed to convince TypeScript that it's okay:

  • key.indent is only valid if we know that "indent" is in key in the first place (and some of the variants don't include an indent field), so we have to do "indent" in key before we can write key.indent
  • 'indent' in key is only valid if key is not null/undefined, so we have to check key && first

@Nathan-Fenner
Copy link
Author

Is this a good idea to include as a partial implementation first, or do we need to ensure that it also covers block sequences and flow collections?

It handles block sequences in the sense of knowing how much indentation occurs before a - which is the common case when used as a list in a map (or a list of fields), see the updated example in the PR description.

The less common case is where you have indentation after the -, as in:

-
  -
    - foo
    - bar
  -
    - baz

Similarly, it reformats

list_map2:
  -
    foo: 1
    bar: 2
    qux: 3
  -
    foo: 4
    bar: 5
    qux: 6

into

list_map2:
  - foo: 1
    bar: 2
    qux: 3
  - foo: 4
    bar: 5
    qux: 6

Since these kinds of structures are less common (in my experience) I think it's fine as a first step to exclude them, and then revisit the approach in a later PR.


Similarly, I'm skipping over flow collections, since they're just fairly different. They have different kinds of indentation/spacing concerns so the implementation would probably be fairly different. And in my experience they're usually much smaller e.g. single-line, with little variation in how they're formatted, so they're more likely to just be preserved anyway.


What happens when a new key/value pair is added to a map that has pairs with a set srcIndentStep? Could/should the value (also?) be set on the collection rather than the pair?

My main goal in this change is to avoid making unnecessary edits to the original, rather than trying to "preserve style". So in this case, if a new "block" is made, that block will have a regular indent step (e.g. 2). So for example, if we had the following document:

count:
          animals:
                    cat: 1
                    dog: 2

if we write doc.setIn(["count", "animals", "lizard"], 3) then we get

count:
          animals:
                    cat: 1
                    dog: 2
                    lizard: 3

because the indentation before lizard belongs to the Pair with key animals. So this behaves exactly as we expect.

On the other hand, if we have

count:
        mammals:
                cat: 1
                dog: 2

and we write doc.setIn(["count", "reptiles"], {"lizard": 3, "snake": 4}); then we get

count:
        mammals:
                cat: 1
                dog: 2
        reptiles:
          lizard: 3
          snake: 4

So the yaml is still valid, but the new block is not indented as much as its sibling blocks. I think that there is a case to be made that the parent block could infer a "desirable" step for new children pairs, but as far as addressing my original goal of eliminating redundant diffs, it makes no difference. So I'm ambivalent about including this now, since it might be nice, but it would be more work and doesn't directly help my use-case.

Copy link
Owner

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to have kept you waiting; been focusing on other projects recently.

On further thought, I think a solution is needed here for keeping all map values similarly indented when they change. As in, if this document is parsed with keepMapIndent: true:

foo: |
    bar
x: y

and then the y value is updated to 'y\nz\n', then the stringified output should be:

foo: |
    bar
x: |
    y
    z

That probably means that something like a srcIndentSep value also needs to be set on a block map during its composition.

The added test cases should also cover explicit tags and aliases, complex/explicit keys, as well as block scalars and flow collection values. All of those present occasionally interesting challenges for YAML indentation.

Comment on lines +23 to +25
if (srcIndentStep > 0 || (srcIndentStep === 0 && isSeq(value))) {
// Indentation can only be preserved if it's positive, or if it's 0
// and the item to render is a seq, since:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite so clear-cut, as this doesn't take into account flow sequences, explicit tags, or anchors. There's an if statement starting on line 112 that covers that case for the indentSeq: true option.

To properly support keeping block sequence indents starting with -, it may be necessary to keep track of something like a srcIndentSeq boolean, and to use that when appropriate to set the indentSeq value.

*
* Default: `false`
*/
preserveCollectionIndentation?: boolean
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
preserveCollectionIndentation?: boolean
keepMapIndent?: boolean

The verb "keep" is used more commonly here; see e.g. the keepSourceTokens option.

The consideration here is also only for maps, rather than all collections, and the name of the option ought to reflect that.

Also, these options ought to be kept alphabetically sorted.

Comment on lines +112 to +115
const keyIndent: number | undefined =
key && 'indent' in key ? key.indent : undefined
const valueIndent: number | undefined =
value && 'indent' in value ? value.indent : undefined
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be clearer to skip defining these intermediate variables, and just use the conditionals in the subsequent if statement directly.

@eemeli
Copy link
Owner

eemeli commented Apr 18, 2022

Closing due to inactivity.

@eemeli eemeli closed this Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Array-indent not maintained between parseDocument/toString
2 participants